Python is a popular, open-source programming language used for both scripting applications and standalone programs. Python can be used to do pretty much anything. For example, you can use Python as a calculator. Position your cursor in the code cell below and hit [shift][enter]. The output should be 12.
In [1]:
6 * 2
Out[1]:
The extra spaces around the * are there only to make the code more (visually) readable; 2 * 3 works just as well as 2*3.
When you are programming, you want to store your values in variables.
In [2]:
a = 6
b = 2
a * b
Out[2]:
Both a and b are now VARIABLES. Each variable has a type. In this case, they are both INTEGERS (whole numbers). To write the value of a variable to the screen, use the print statement.
The last statement of a code cell is automatically printed to the screen if it is not stored in a variable, as was shown above.
In [3]:
print a
print b
print a * b
print a / b
You can add some text to the print statement by putting the text between quotes (either single or double quotes work, as long as you use the same kind at the beginning and end) and separating the text string and the variable by a comma:
In [4]:
print 'the value of a is', a
A variable can be raised to a power by using ** (a hat ^, as used in some other languages, doesn't work: in Python it is the bitwise exclusive-or operator).
In [5]:
a**b
Out[5]:
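To see why the hat is a trap, here is a small sketch comparing the two operators. Because ^ is the bitwise exclusive-or operator in Python, it runs without an error but gives a different number. The names power and xor are chosen here for illustration only, and print is called with parentheses around a single formatted string so that the cell also runs under Python 3.

```python
a = 6
b = 2

power = a ** b   # 6 squared
xor = a ^ b      # bitwise XOR: 0b110 ^ 0b010 = 0b100

print('a ** b gives %d' % power)
print('a ^ b gives %d' % xor)
```

Since the hat silently returns a number, a typo like this will not be caught by Python itself.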
In [ ]:
Important insight on division: when dividing two variables, you have to be careful as division of two integers results in an integer. So 1/3 gives 0. If one of the variables is a floating point variable (simply called a float), which has a decimal point, then the result is also a float and gives the result you want (0.33333...).
This rule for integer division may be annoying, and you will likely make a mistake with it sometime.
In [6]:
print '1/3 gives', 1 / 3
print '1.0 / 3 gives', 1.0 / 3
print '1 / 3.0 gives', 1 / 3.0
print '1.0 / 3.0 gives', 1.0 / 3.0
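When both values are stored in integer variables, you cannot simply type a decimal point. A minimal sketch of one way around this, assuming you convert one of the variables with the built-in float function, is shown below. The // operator (not used elsewhere in this Notebook) always rounds the result down, and the variable names are made up for this example. Note that under Python 3 the / operator always returns a float, so the surprise described above only occurs in Python 2.

```python
numerator = 1
denominator = 3

# // always rounds the result down to a whole number
floor_result = numerator // denominator

# converting one operand to a float gives the fractional result
float_result = float(numerator) / denominator

print('floor division gives %d' % floor_result)
print('float division gives %.5f' % float_result)
```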
Once you have created a variable in a Jupyter session, it will remain in memory, so you can use it in other cells as well. For example, the variable a, which was defined earlier in this Notebook, still exists. It will be 6 unless you changed it in Exercise 1.
In [7]:
print 'a:', a
The user decides the order in which code blocks are executed. For example, In [6] means that it is the sixth execution of a code block. If you change the same code block and run it again, it will get number 7. If you define the variable a in code block 7, it will overwrite the value of a defined in a previous code block.
Variable names may be as long as you like (you gotta do the typing though). Selecting descriptive names aids in understanding the code. Variable names cannot have spaces, nor can they start with a number. And variable names are case sensitive. So the variable a is not the same as the variable A. The name of a variable may be anything you want, except for reserved words in the Python language. For example, it is not possible to create a variable print = 7, as print is a reserved word. You will learn many of the reserved words when we continue; they are colored green when you type them in the Notebook.
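If you are unsure whether a name is reserved, you can ask Python. The sketch below uses the keyword module from the standard library; the list of candidate names is made up for this example. The exact set of reserved words depends on the Python version (print is reserved in Python 2 only), so the example sticks to words that are reserved in every version.

```python
import keyword

a = 6
A = 10                    # a different variable: names are case sensitive
names_differ = (a != A)

# hypothetical candidate names, checked against the reserved words
candidates = ['temperature', 'for', 'while', 'my_value']
reserved = [name for name in candidates if keyword.iskeyword(name)]

print(reserved)
```

Here 'for' and 'while' are reported as reserved, while 'temperature' and 'my_value' are fine to use.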
Plotting is not part of standard Python. Luckily, a package exists to create beautiful graphics. A package is a library of functions for a specific set of tasks. There are many Python packages and we will use several of them.
The graphics package is called matplotlib. To be able to use the plotting functions in matplotlib we have to import it. Here we import the plotting part of matplotlib (matplotlib.pyplot) and call it plt.
In [8]:
import matplotlib.pyplot as plt
%matplotlib inline
Packages only have to be imported once in an IPython session. After the import, any plotting function may be called from any code cell as plt.function. For example:
In [9]:
plt.plot([1,2,3,2])
Out[9]:
Let's try to plot $y$ vs $x$ for $x$ going from $-4$ to $4$ for the polynomial in the exercise above. To do that, we need to evaluate $y$ at a bunch of points.
A sequence of values of the same type is called an array (for example an array of integers or floats).
Array functionality is available in the package numpy. Let's import numpy and call it np, so that any function in the numpy package may be called as np.function.
In [10]:
import numpy as np
To create an array x consisting of, for example, 5 equally spaced points between -4 and 4, use the linspace command:
In [11]:
x = np.linspace(-4, 4, 5)
print x
In the above cell, x is an array of 5 floats (-4. is a float, -4 is an integer).
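As a quick check of what linspace returns, the sketch below computes the spacing between neighboring points. Both end points are included by default, so 5 points between -4 and 4 are a distance (4 - (-4)) / (5 - 1) = 2 apart. The variable name spacing is introduced here for illustration.

```python
import numpy as np

x = np.linspace(-4, 4, 5)
spacing = x[1] - x[0]     # distance between neighboring points

print(x)
print('spacing is %.1f' % spacing)
```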
If you type np.linspace and then an opening parenthesis like:
np.linspace(
and then hit [shift-tab] a little help box pops up to explain the input arguments of the function. When you click on the + sign, you can scroll through all the documentation of the linspace function. Click on the x sign to remove the help box. Let's plot $y$ using 100 $x$ values from $-4$ to $4$.
In [12]:
a = 1
b = 1
c = -6
x = np.linspace(-4, 4, 100)
y = a*x**2 + b*x + c # Compute y for all x values
plt.plot(x, y)
Out[12]:
Note that one hundred y values are computed in the simple line y = a*x**2 + b*x + c. The text after the # is a comment in the code. Any text on the line after the # is ignored by Python. Python treats arrays in the same fashion as it treats regular variables when you perform mathematical operations. The math is simply applied to every value in the array (and it runs much faster than when you would do every calculation separately).
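To make the "applied to every value" idea concrete, the sketch below computes the same polynomial twice: once with a single array expression and once by looping over the points one at a time. The for loop and the function np.zeros have not been introduced yet in this Notebook; they are used here only to spell out what the array expression does for you.

```python
import numpy as np

a = 1
b = 1
c = -6
x = np.linspace(-4, 4, 100)

# one line: the math is applied to all 100 values at once
y_array = a * x**2 + b * x + c

# the same calculation, one value at a time
y_loop = np.zeros(len(x))
for i in range(len(x)):
    y_loop[i] = a * x[i]**2 + b * x[i] + c

same = np.allclose(y_array, y_loop)
print(same)
```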
You may wonder what the statement [<matplotlib.lines.Line2D at 0x30990b0>] is (the numbers on your machine may look different). This is actually a handle to the line that is created with the last command in the code block (in this case plt.plot(x,y)). You can tell the Notebook not to print this to the screen by putting a semicolon after the last command in the code block (so type plt.plot(x,y);). We will learn later on that it may also be useful to store this handle in a variable.
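As a preview of what storing the handle can be good for, here is a minimal sketch. It assumes that plt.plot returns a list with one handle per curve, which matches the square brackets in the printed statement, and it uses the set_linewidth and set_color functions of the handle to change the line after it has been drawn. The variable names lines and line are chosen for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 100)
y = x**2 + x - 6

lines = plt.plot(x, y)    # a list of handles, one per curve
line = lines[0]           # one curve was plotted, so take the first entry

line.set_linewidth(3)     # change the line after it was created
line.set_color('r')
```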
The plot function can take many arguments. Looking at the help box of the plot function (by typing plt.plot( and then shift-tab) gives you a lot of help. Typing plt.plot? gives a new scrollable subwindow at the bottom of the notebook, showing the documentation on plot. Click the x in the upper right hand corner to close the subwindow again.
In short, plot can be used with one argument as plot(y), which plots y values along the vertical axis and enumerates the horizontal axis starting at 0. plot(x,y) plots y vs x, and plot(x,y,formatstring) plots y vs x using colors and markers defined in formatstring, which can be a lot of things. It can be used to define the color, for example 'b' for blue, 'r' for red, and 'g' for green. Or it can be used to define the linetype '-' for line, '--' for dashed, ':' for dots. Or you can define markers, for example 'o' for circles and 's' for squares. You can even combine them: 'r--' gives a red dashed line, while 'go' gives green circular markers.
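The three ways of calling plot can be sketched as follows. The y values are the ones plotted earlier in this Notebook; the x values in the last two calls are written out by hand to show what plot(y) does implicitly. The names dashed and circles are introduced here only to keep hold of the resulting lines.

```python
import matplotlib.pyplot as plt

y = [1, 2, 3, 2]

plt.plot(y)                                  # horizontal axis is 0, 1, 2, 3
dashed = plt.plot([0, 1, 2, 3], y, 'r--')[0]  # red dashed line
circles = plt.plot([0, 1, 2, 3], y, 'go')[0]  # green circular markers, no line
```

All three calls draw in the same figure, so the dashed line and the markers end up on top of the first curve.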
If that isn't enough, plot takes a large number of keyword arguments. A keyword argument is an optional argument that may be added to a function. The syntax is function(keyword1=value1, keyword2=value2), etc. For example, to plot a line with width 6 (the default is 1; also note the semicolon at the end):
In [13]:
plt.plot([1, 2, 3], [2, 4, 3], linewidth=6);
Keyword arguments should come after regular arguments. plot(linewidth = 6, [1, 2, 3], [2, 4, 3]) gives an error.
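Keyword arguments are not special to matplotlib. The sketch below defines a small function of our own, describe_line, which is hypothetical and only returns a piece of text instead of drawing anything. Defining functions is not covered in this Notebook, so treat this as a preview. The last part uses the built-in compile function to show that Python rejects a call with a keyword argument in front of a regular argument before the call is even run.

```python
def describe_line(x, y, linewidth=1, color='b'):
    """Hypothetical helper: summarize a line instead of plotting it."""
    return '%d points, linewidth %d, color %s' % (len(x), linewidth, color)

# keyword arguments are optional: leave them out to get the defaults
default_summary = describe_line([1, 2, 3], [2, 4, 3])
wide_summary = describe_line([1, 2, 3], [2, 4, 3], linewidth=6)
print(default_summary)
print(wide_summary)

# a keyword argument in front of a regular argument is a syntax error
try:
    compile('describe_line(linewidth=6, [1, 2, 3], [2, 4, 3])', '<example>', 'eval')
    order_accepted = True
except SyntaxError:
    order_accepted = False
print(order_accepted)
```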
Names may be added along the axes with the xlabel and ylabel functions, e.g., plt.xlabel('this is the x-axis'). Note that both functions take a string as argument. A title can be added to the figure with the plt.title command. Multiple curves can be added to the same figure by giving multiple plotting commands. They are automatically added to the same figure.
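Put together, a labeled figure with two curves may look like the sketch below. The two curves (the polynomial used earlier and a straight line) and the label texts are picked for illustration. The call to plt.figure(), which is not discussed in the text above, starts a new empty figure so that the curves do not end up in a figure made earlier.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 100)

plt.figure()                        # start a new, empty figure
plt.plot(x, x**2 + x - 6, 'b-')     # first curve
plt.plot(x, 2 * x, 'r--')           # second curve, added to the same figure
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Two curves in one figure');
```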
Plot $y=(x+2)(x-1)(x-2)$ for $x$ going from $-3$ to $3$ using a dashed red line. On the same figure, plot a blue circle for every point where $y$ equals zero. Set the size of the markers to 10 (you may need to read plt.plot? to find out how to do that). Label the axes as 'x-axis' and 'y-axis'. Add the title 'First nice Python figure of Your Name', where you enter your own name.
In [ ]:
You are provided with data files containing the mean monthly temperature of Holland, New York City, and Beijing. The Dutch data is stored in holland_temperature.dat, and the other filenames are similar. Plot the temperature for each location against the number of the month (starting with 1 for January) all in a single graph. Add a legend by using the function plt.legend(['line1','line2']), etc., but then with more descriptive names. Find out about the legend command using plt.legend?. Place the legend in an appropriate spot (the upper left-hand corner may be nice, or let Python figure out the best place).
In [ ]:
Load the average monthly air temperature and seawater temperature for Holland. Create one plot with two graphs above each other using the subplot command (use plt.subplot?). On the top graph, plot the air and sea temperature. Label the ticks on the horizontal axis as 'jan', 'feb', 'mar', etc., rather than 0,1,2,etc. Use plt.xticks? to find out how. In the bottom graph, plot the difference between the air and seawater temperature. Add legends, axes labels, the whole shebang.
In [ ]:
The plotting package matplotlib allows you to make very fancy graphs. Check out the matplotlib gallery to get an overview of many of the options. The following exercises use several of the matplotlib options.
At the 2012 London Olympics, the top ten countries (plus the rest) receiving gold medals were ['USA', 'CHN', 'GBR', 'RUS', 'KOR', 'GER', 'FRA', 'ITA', 'HUN', 'AUS', 'OTHER']. They received [46, 38, 29, 24, 13, 11, 11, 8, 8, 7, 107] gold medals, respectively. Make a pie chart (type plt.pie? or go to the pie charts in the matplotlib gallery) of the top 10 gold medal winners plus the others at the London Olympics. Try some of the keyword arguments to make the plot look nice. You may want to give the command plt.axis('equal') to make the scales along the horizontal and vertical axes equal so that the pie actually looks like a circle rather than an ellipse. There are four different ways to specify colors in matplotlib plotting; you may read about them in the matplotlib documentation. The coolest way is to use the html color names. Use the colors keyword in your pie chart to specify a sequence of colors. The sequence must be between square brackets, each color must be between quotes preserving upper and lower cases, and they must be separated by commas like ['MediumBlue','SpringGreen','BlueViolet']; the sequence is repeated if it is not long enough. The html names for the colors may be found in any list of HTML color names.
In [ ]:
Load the air and sea temperature, as used in Exercise 4, but this time make one plot of temperature vs the number of the month and use the plt.fill_between command to fill the space between the curve and the $x$-axis. Specify the alpha keyword, which defines the transparency. Some experimentation will give you a good value for alpha (stay between 0 and 1). Note that you need to specify the color using the color keyword argument.
In [ ]:
In [14]:
a = 1
b = 1
c = -6
x = -2
y = a * x**2 + b * x + c
print 'y evaluated at x=-2 is', y
x = 0
y = a * x**2 + b * x + c
print 'y evaluated at x=0 is', y
x = 2
y = a * x**2 + b * x + c
print 'y evaluated at x=2 is', y
In [15]:
x = np.linspace(-3, 3, 100)
y = (x + 2) * (x - 1) * (x - 2)
plt.plot(x, y, 'r--')
plt.plot([-2, 1, 2], [0, 0, 0], 'bo', markersize=10)
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('First Python Figure of Mark Bakker')
Out[15]:
In [16]:
holland = np.loadtxt('holland_temperature.dat')
newyork = np.loadtxt('newyork_temperature.dat')
beijing = np.loadtxt('beijing_temperature.dat')
plt.plot(np.linspace(1, 12, 12), holland)
plt.plot(np.linspace(1, 12, 12), newyork)
plt.plot(np.linspace(1, 12, 12), beijing)
plt.xlabel('Number of the month')
plt.ylabel('Mean monthly temperature (Celsius)')
plt.xlim(1, 12)
plt.legend(['Holland','New York','Beijing'], loc='best');
In [17]:
air = np.loadtxt('holland_temperature.dat')
sea = np.loadtxt('holland_seawater.dat')
plt.subplot(211)
plt.plot(air, 'b', label='air temp')
plt.plot(sea, 'r', label='sea temp')
plt.legend(loc='best')
plt.ylabel('temp (Celsius)')
plt.xlim(0, 11)
plt.xticks([])
plt.subplot(212)
plt.plot(air-sea, 'ko')
plt.xticks(np.linspace(0, 11, 12),
['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'])
plt.xlim(0, 11)
plt.ylabel('air - sea temp (Celsius)');
In [18]:
gold = [46, 38, 29, 24, 13, 11, 11, 8, 8, 7, 107]
countries = ['USA', 'CHN', 'GBR', 'RUS', 'KOR', 'GER', 'FRA', 'ITA', 'HUN', 'AUS', 'OTHER']
plt.pie(gold, labels = countries, colors = ['Gold', 'MediumBlue', 'SpringGreen', 'BlueViolet'])
plt.axis('equal');
In [19]:
air = np.loadtxt('holland_temperature.dat')
sea = np.loadtxt('holland_seawater.dat')
plt.fill_between(range(1,13), air, color='b', alpha=0.3)
plt.fill_between(range(1,13), sea, color='r', alpha=0.3)
# Ticks at 1..12 so the labels line up with the x values in range(1, 13)
plt.xticks(np.linspace(1, 12, 12), ['jan', 'feb', 'mar', 'apr',
           'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'])
plt.xlim(1, 12)
plt.ylim(0, 20)
plt.xlabel('Month')
plt.ylabel('Temperature (Celsius)');